28 research outputs found
Social Network Extraction from Text
In the pre-digital age, when electronically stored information did not exist, the only ways of creating representations of social networks were manual: surveys, interviews, and observations. In the digital age of the internet, numerous indications of social interactions and associations are available electronically, in an easily accessible form, as structured meta-data. This lessens our dependence on manual surveys and interviews for creating and studying social networks. However, there are sources of networks that remain untouched simply because they are not associated with any meta-data. Primary examples of such sources include the vast amounts of literary texts, news articles, content of emails, and other forms of unstructured and semi-structured texts.
The main contribution of this thesis is the introduction of natural language processing and applied machine learning techniques for uncovering social networks in such sources of unstructured and semi-structured texts. Specifically, we propose three novel techniques for mining social networks from three types of texts: unstructured texts (such as literary texts), emails, and movie screenplays. For each of these types of texts, we demonstrate the utility of the extracted networks on three applications (one for each type of text)
Predicting Interests of People on Online Social Networks
We introduce a new data set which contains both a self-declared friendship network and self-chosen attributes from a finite list defined by the social networking site. We propose Gaussian Field Harmonic Functions (GFHF), a state-of-the-art graph transduction algorithm, as a novel way of testing the relevance of the friendship network for predicting individual attributes. We show that the underlying self-declared friendship network allows us to predict some but not all attributes. We use Support Vector Machines (SVM) in conjunction with GFHF to show that other attributes such as age or languages spoken are also important
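As an illustration of the graph transduction idea behind GFHF, here is a minimal sketch, assuming a tiny hypothetical friendship graph with unit edge weights and one binary attribute. It computes the harmonic solution iteratively (each unlabeled node repeatedly takes the weighted average of its neighbors) instead of using the closed-form matrix solution. The function name, node names, and data are illustrative assumptions, not taken from the paper.

```python
def harmonic_labels(weights, labeled, iterations=200):
    """Iteratively approximate the harmonic function on a weighted graph.

    weights: dict mapping node -> dict of neighbor -> edge weight (symmetric).
    labeled: dict mapping node -> known score (1.0 = has attribute, 0.0 = not).
    Labeled nodes stay clamped; every other node converges towards the
    weighted average of its neighbors' scores.
    """
    scores = {node: labeled.get(node, 0.5) for node in weights}
    for _ in range(iterations):
        for node, neighbors in weights.items():
            if node in labeled:
                continue  # known labels are never overwritten
            total = sum(neighbors.values())
            if total == 0:
                continue  # isolated node: nothing to propagate
            scores[node] = sum(w * scores[n] for n, w in neighbors.items()) / total
    return scores


# Hypothetical friendship chain: ann - bo - cy - dee.
friends = {
    "ann": {"bo": 1.0},
    "bo": {"ann": 1.0, "cy": 1.0},
    "cy": {"bo": 1.0, "dee": 1.0},
    "dee": {"cy": 1.0},
}
known = {"ann": 1.0, "dee": 0.0}  # ann declares the attribute, dee does not
predicted = harmonic_labels(friends, known)
```

In this toy chain, the unlabeled user closer to `ann` ends up with the higher score, which is the sense in which the friendship network carries information about an attribute.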
An Email Attachment is Worth a Thousand Words, or Is It?
There is an extensive body of research on Social Network Analysis (SNA) based
on the email archive. The network used in the analysis is generally extracted
either by capturing the email communication in From, To, Cc and Bcc email
header fields or by the entities contained in the email message. In the latter
case, the entities could be, for instance, a bag of words, URLs, names,
phone numbers, etc. They could also include the textual content of attachments,
for instance Microsoft Word documents, Excel spreadsheets, or Adobe PDFs. The nodes
in this network represent users and entities. The edges represent communication
between users and relations to the entities. We suggest taking a different
approach to network extraction and using attachments shared between users as
the edges. The motivation for this is two-fold. First, sharing attachments is a
manifestation of "intimacy", and hence of the strength of a relation. Second, our
statistical analysis of private email archives that we collected and of the Enron
email corpus shows that attachments contribute on average around 80-90% of
the archive's disk-space usage, which means that most of the data is presently
ignored in the SNA of email archives. Consequently, we hypothesize that this
approach might provide more insight into the social structure of the email
archive. We extract the communication and shared-attachment networks from
the Enron email corpus. We further analyze degree, betweenness, closeness, and
eigenvector centrality measures in both networks and review the differences and
what can be learned from them. We use a nearest-neighbor algorithm to generate
similarity groups for five Enron employees. The groups are consistent with
Enron's organizational chart, which validates our approach.
Comment: 12 pages, 4 figures, 7 tables, IML'17, Liverpool, U
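The shared-attachment network described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the email record layout (`from`, `to`, `attachments`), the user names, and the choice to weight an edge by the number of attachments exchanged are all hypothetical, and only degree centrality is shown out of the four measures the abstract mentions.

```python
from collections import defaultdict


def attachment_network(emails):
    """Build undirected weighted edges from emails that carry attachments.

    Each email is a dict with hypothetical keys 'from', 'to' and 'attachments'.
    An edge (a, b) counts the attachments exchanged between a and b.
    """
    edges = defaultdict(int)
    for email in emails:
        if not email["attachments"]:
            continue  # plain messages do not contribute to this network
        for recipient in email["to"]:
            if recipient == email["from"]:
                continue  # ignore self-addressed copies
            pair = tuple(sorted((email["from"], recipient)))
            edges[pair] += len(email["attachments"])
    return dict(edges)


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Hypothetical sample archive (not real Enron data).
sample = [
    {"from": "alice", "to": ["bob", "carol"], "attachments": ["report.doc"]},
    {"from": "bob", "to": ["alice"], "attachments": ["q1.xls", "q2.xls"]},
    {"from": "carol", "to": ["bob"], "attachments": []},
]
network = attachment_network(sample)
centrality = degree_centrality(network)
```

Note that the message from `carol` to `bob` adds no edge here because it has no attachment, which is exactly where this network differs from the header-based communication network.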
Internet of Things and its enhanced data security
The Internet of Things (IoT), an emerging global Internet-based technical architecture facilitating the exchange of information, goods and services, has an impact on the security and privacy of the involved stakeholders. Measures ensuring the architecture's resilience to attacks, data authentication, access control and client privacy need to be established. This paper includes a survey of IoT and various security issues related to it. Of these security issues, we focus on data authentication and transfer. We discuss two levels of security in the form of two different approaches, i.e. the Advanced Encryption Standard (AES) and steganography via an image, and the simulation of these two approaches in MATLAB
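To make the image steganography idea concrete, here is a minimal sketch of least-significant-bit (LSB) embedding, assuming the cover image is available as a flat list of 8-bit pixel values. LSB embedding is a common textbook scheme chosen here for illustration; the abstract does not say which steganographic method the paper uses, and the paper's simulation is in MATLAB rather than Python. The AES step is omitted because it would require a third-party library; the payload below is arbitrary bytes.

```python
def embed(pixels, payload):
    """Hide payload bytes in the least significant bits of pixel values."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image is too small for the payload")
    stego = list(pixels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit  # overwrite only the lowest bit
    return stego


def extract(pixels, length):
    """Recover `length` bytes from the least significant bits of the pixels."""
    recovered = bytearray()
    for i in range(length):
        byte = 0
        for value in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        recovered.append(byte)
    return bytes(recovered)


# Hypothetical 64-pixel grayscale cover and a short message.
cover = list(range(64))
secret = b"IoT"
stego_pixels = embed(cover, secret)
message = extract(stego_pixels, len(secret))
```

Each pixel changes by at most one intensity level, which is why the modification is hard to see; the receiver must know the payload length to read it back.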
Annotation scheme for social network extraction from text
We are interested in extracting social networks from text. We present a novel annotation scheme for a new type of event, called a social event, in which two people participate such that at least one of them is cognizant of the other. We compare our scheme in detail to the ACE scheme. We perform a detailed analysis of inter-annotator agreement, which shows that our annotations are reliable
Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM
Sentiment analysis on large-scale social media data is important to bridge
the gaps between social media content and real-world activities, including
political election prediction and individual and public emotional status
monitoring and analysis. Although textual sentiment analysis has
been well studied on platforms such as Twitter and Instagram, analysis of
the role of extensive emoji use in sentiment analysis remains limited. In this
paper, we propose a novel scheme for Twitter sentiment analysis with extra
attention on emojis. We first learn bi-sense emoji embeddings from positive
and negative sentiment tweets separately, and then train a sentiment
classifier that attends to these bi-sense emoji embeddings with an
attention-based long short-term memory network (LSTM). Our experiments show
that the bi-sense embedding is effective for extracting sentiment-aware
embeddings of emojis and outperforms the state-of-the-art models. We also
visualize the attention weights to show that the bi-sense emoji embedding provides
better guidance on the attention mechanism to obtain a more robust
understanding of the semantics and sentiments
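The general mechanism of attending over two sense vectors of one emoji can be sketched with plain dot-product attention. This is a minimal illustration, not the paper's architecture: the two-dimensional vectors, the use of a single query vector standing in for an LSTM hidden state, and the function names are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: weight each key by its similarity to the query.

    Returns the attention weights and the weighted sum of the keys.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Hypothetical 2-d vectors: one emoji with a positive and a negative sense.
positive_sense = [1.0, 0.0]
negative_sense = [0.0, 1.0]
hidden_state = [1.0, 0.0]  # stands in for an LSTM state of a positive tweet
sense_weights, emoji_vector = attend(hidden_state, [positive_sense, negative_sense])
```

With these toy inputs the positive sense receives the larger weight, so the blended emoji vector leans towards the sense that matches the context.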
Assessment of Anti-aging Efficacy of the Master Antioxidant Glutathione
Glutathione (GSH), a chief tripeptide antioxidant, is present inside every cell of the body and may have a profound effect on the control of aging. The anti-aging potency of GSH and its role in the progression of certain age-related diseases are still unclear. Glutathione-related articles were searched in the PubMed database, from the very first study of glutathione, related to its discovery in 1923, up to its status in 2016. The data were made more informative and precise by also searching Google for glutathione-relevant reports. Articles were selected that indicated the association of glutathione with the progression of age-related diseases, pre-clinical and clinical studies, and longevity effects. The analysis indicated that increased oxidative stress (an elevated GSSG/GSH ratio) is responsible for the incidence of age-related diseases and the failure of different organs. The glutathione redox ratio (GSSG/GSH) was found to become more pro-oxidizing with aging, which plays a chief role in the generation of reactive oxygen species (ROS) and subsequently damages macromolecular structures, affecting normal body mechanisms and functions. The clinical data indicate that glutathione is a potent therapeutic agent for the control of age-related diseases, and experimental analysis has confirmed its prominent effect on longevity
Cytopathology Using High Resolution Digital Holographic Microscopy
We summarize a study involving simultaneous imaging of cervical cells from Pap-smear samples using bright-field and quantitative phase microscopy. The optimization approach to phase reconstruction used in our study enables full diffraction-limited performance from single-shot holograms and is thus suitable for reducing the cost of a quantitative phase microscope system. Over 48,000 cervical cells from patient samples obtained from three clinical sites have been imaged in this study. The clinical sites used different sample preparation methodologies, and the subjects represented a range of age groups and geographical diversity. Visual examination of quantitative phase images of cervical cell nuclei shows distinct morphological features that we believe have not appeared in the prior literature. A PCA-based analysis of numerical parameters derived from the bright-field and quantitative phase images of the cervical cells shows good separation of superficial, intermediate and abnormal cells. The distribution of phase-based parameters of normal cells is also shown to be highly overlapping among different patients from the same clinical site, among patients across different clinical sites, and for two age groups (below and above 30 years), thus suggesting the robustness of quantitative phase as an imaging modality and the possibility of standardizing it for cell classification in future clinical usage